ABSTRACT

The  goal  of  this  project  was  to  develop  an  inexpensive,  self-contained 
system  of  hardware  and  software  to  support  the  development,  administration, 
and  evaluation  of  computerized  adaptive  tests.  Toward  that  goal,  commercial 
hardware  was  selected  and  a  comprehensive  software  system  called  the 
MicroCAT™  Testing  System  was  developed.  The  MicroCAT  system  was
implemented  in  a  local  area  network  configuration  at  the  Basic  Electricity  and 
Electronics  School  of  the  Naval  Training  Center  in  San  Diego.  It  was 
integrated  into  the  school’s  computer-managed  instruction  system  and  made 
available  to  the  University  of  Illinois  for  research  on  adaptive  diagnostic 
testing.  In  response  to  suggestions  from  users  at  this  and  other  non-government 
implementations,  the  MicroCAT  system  was  refined  into  a  marketable 
commercial  product.


INTRODUCTION

Computerized  adaptive  testing  offers  a  number  of  advantages  over
conventional  testing,  including  security,  efficiency,  and  immediacy  of  results.
However,  adaptive  tests  must  be  administered  on  a  computer,  which  can  mean 
large  expenditures  for  equipment  and  system  development.  The  overall 
objective  of  this  project  was  to  ameliorate  this  problem  by  developing  an 
inexpensive,  self-contained  system  of  hardware  and  software  for  the 
administration  of  a  wide  variety  of  tests. 

The  effort  consisted  of  two  contractually  separate  phases.  During  Phase  I, 
a  system  was  designed  to  facilitate  the  development  and  to  support  the 
administration  of  adaptive  and  conventional  computerized  tests.  The  system 
contained  extensive  facilities  for  entering  test  items,  organizing  them  into 
adaptive  and  conventional  tests,  administering  the  tests,  and  analyzing  the 
results.  The  design  was  documented  by  a  preliminary  user’s  manual. 

Phase  II  of  the  effort  had  four  objectives:  1)  to  select  and  procure 
computer  hardware  for  implementing  the  system,  2)  to  implement  on  the 
selected  hardware  the  software  system  described  in  the  preliminary  user’s 
manual,  3)  to  install  and  field  test  the  equipment  at  evaluation  sites,  and  4)  to 
evaluate  and  refine  the  system  based  on  feedback  from  the  test  sites.  Progress 
toward  each  of  these  objectives  is  described  below. 


SELECTION  OF  THE  HARDWARE 

It  was  originally  anticipated  that  the  selection  of  the  hardware  would 
proceed  in  two  stages.  First,  a  list  would  be  compiled  including  all  of  the 
computer  hardware  that  could  adequately  administer  psychological  tests.  In  the 
second  stage,  three  systems  would  be  selected  from  the  list  and  tested 
extensively.  The  evaluation  was  to  have  considered  processing  power,  clarity  of 
display,  system  reliability,  and  system  durability. 

By  the  time  the  Phase  II  contract  was  awarded,  however,  the
microcomputer  hardware  environment  had  changed  considerably.  Many  systems  on
the  market  could  meet  the  minimum  requirements  for  psychological  testing. 
Processing  power,  display  quality,  and  durability  were  no  longer  issues 
(although  system  reliability  was  still  important).  Two  major  new  criteria  had 
appeared,  however:  adherence  to  new  industry  standards,  and  manufacturer 
longevity.  IBM  had  announced  its  personal  computer  some  months  previously, 
and  it  had  become  the  de  facto  industry  standard.  Many  small  manufacturers 
of  quality  equipment  had  gone  out  of  business,  in  part  because  of  their  lack  of 
compatibility  with  IBM  products. 

It  appeared  to  be  a  poor  investment  of  time  and  equipment  to  extensively 
evaluate  the  performance  capabilities  of  three  different  microcomputers  when 
it  was  apparent  that  factors  other  than  performance  would  determine  the 
selection.  Therefore,  the  selection  was  made  on  the  basis  of  specification 
research.  Seven  factors  were  considered  in  selecting  the  hardware:  computing
power,  mass  storage  capacity,  graphics  capability,  networking  capability,
manufacturer  prominence,  separation  of  disks  from  the  display,  and
manufacturing  site.


Computing  power  is  essential  in  an  adaptive  testing  system  because  a 
substantial  amount  of  arithmetic  computation  must  be  performed  for  computing 
scores  as  well  as  for  selecting  items.  Experience  had  shown  that  the  Intel  8088 
microprocessor,  running  at  a  clock  speed  of  approximately  5  MHz,  was  capable 
of  performing  all  adaptive  testing  functions  in  a  single-user  testing 
environment.  Since  this  chip  had  become  something  of  a  standard  in  the 
microcomputer  industry,  acceptable  computing  power  was  loosely  defined  as 
power  greater  than  or  equal  to  that  of  the  8088. 
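
To give a sense of the arithmetic involved in item selection, the following is a minimal sketch of one common adaptive rule: administer the unused item with the greatest Fisher information at the current ability estimate. The report does not state which selection rule or response model MicroCAT used, so the three-parameter logistic model, the scaling constant 1.7, the function names, and the three-item pool are all assumptions made for illustration.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed, not from the report)

def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def select_next_item(theta, pool, administered):
    """Return the id of the unused item that is most informative at theta."""
    candidates = [item for item in pool if item not in administered]
    return max(candidates, key=lambda item: item_information(theta, *pool[item]))

# Hypothetical pool: item id -> (a, b, c)
pool = {
    "easy": (1.0, -1.0, 0.2),
    "medium": (1.0, 0.0, 0.2),
    "hard": (1.0, 1.0, 0.2),
}
print(select_next_item(1.0, pool, set()))
```

Each selection evaluates an exponential for every unused item, and scoring repeats similar computations after every response, which is why processor speed mattered on early microcomputers.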

Systems  analysis  in  Phase  I  of  this  effort  had  suggested  that  mass  storage 
approaching  one  megabyte  would  be  required  for  adaptive  testing.  A  number 
of  computer  manufacturers  had  adopted  diskette  drives  capable  of  storing  320 
to  360  kb.  Although  it  was  somewhat  short  of  the  one-megabyte  requirement,  a 
combination  of  two  diskettes  with  a  minimum  of  320  kb  each  was  established 
as  the  minimum  standard. 

Pixel  graphics  were  required  to  represent  drawings  such  as  might  be 
encountered  in  a  test  like  the  Armed  Services  Vocational  Aptitude  Battery 
(ASVAB).  In  general,  the  higher  the  resolution,  the  better  the  picture.  A 
minimum  standard  of  graphics  resolution  was  set  at  300  pixels  horizontally  and 
200  pixels  vertically. 

The  intended  field  test  application  would  require  a  network  capable  of
supporting  a  minimum  of  24  testing  stations.  The  items  would  be  kept  on  a
hard  disk  at  one  of  the  stations  and  would  have  to  be  transmitted  to  each 
testing  station,  one  at  a  time,  upon  demand.  The  minimum  acceptable  network 
was  established  as  one  that  could  support  this  many  stations  and  transmit  data 
fast  enough  that  the  worst  case  would  not  cause  the  system  to  slow  down 
appreciably.  Some  simple  arithmetic  yielded  a  minimum  acceptable  network 
speed.  Considering  a  worst  case  in  which  all  stations  would  request  items 
simultaneously,  each  item  would  contain  one  kilobyte  of  information,  and  the 
worst  response  time  would  be  one  second,  the  network  bus  speed  had  to  be  at 
least  0.192  megabits  per  second. 
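
The worst-case figure can be reproduced with a short calculation. The sketch below assumes that one kilobyte is taken as 1,000 bytes (which is what yields exactly 0.192) and ignores protocol overhead; the function name is hypothetical.

```python
def min_bus_speed_mbps(stations, item_bytes, response_seconds):
    """Bus speed (megabits per second) needed to send one item to every
    station within the allowed response time."""
    total_bits = stations * item_bytes * 8  # 8 bits per byte
    return total_bits / response_seconds / 1_000_000

# 24 stations, one 1,000-byte item each, one-second worst response time
print(min_bus_speed_mbps(24, 1000, 1.0))  # 0.192
```

With 1,024-byte kilobytes the same calculation gives about 0.197 megabits per second, so the requirement is not sensitive to that choice.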

The  preceding  four  factors  were  considered  qualifier  factors;  a  system  had 
to  be  acceptable  on  all  four  to  be  considered.  The  remaining  three  were  used  to 
rank  the  acceptable  candidates. 
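
The two-stage procedure (screen on the four qualifiers, then rank on the remaining three factors) can be sketched as follows. The thresholds come from the text above; the candidate systems, their specifications, the 0 to 10 prominence score, and the order in which the ranking factors are compared are hypothetical, since the report does not say how the ranking factors were weighted.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    at_least_8088: bool     # computing power >= Intel 8088
    diskette_drives: int
    kb_per_diskette: int
    h_pixels: int
    v_pixels: int
    network_mbps: float
    max_stations: int
    prominence: int         # hypothetical 0-10 judgment score
    disks_separable: bool
    us_made: bool

def qualifies(c):
    """A system must pass all four qualifier factors to be considered."""
    return (c.at_least_8088
            and c.diskette_drives >= 2 and c.kb_per_diskette >= 320
            and c.h_pixels >= 300 and c.v_pixels >= 200
            and c.max_stations >= 24 and c.network_mbps >= 0.192)

def rank(candidates):
    """Rank qualifying systems; the comparison order is an assumption."""
    ok = [c for c in candidates if qualifies(c)]
    return sorted(ok,
                  key=lambda c: (c.prominence, c.disks_separable, c.us_made),
                  reverse=True)

# Hypothetical systems, not the ones evaluated in the project
systems = [
    Candidate("System A", True, 2, 360, 640, 200, 10.0, 64, 9, True, True),
    Candidate("System B", True, 2, 320, 320, 200, 1.0, 32, 6, True, True),
    Candidate("System C", True, 2, 320, 280, 192, 1.0, 32, 8, True, True),
]
print([c.name for c in rank(systems)])  # ['System A', 'System B']
```

System C is dropped despite its high prominence score because its display falls short of the 300 by 200 pixel minimum, which mirrors the rule that ranking factors cannot compensate for a failed qualifier.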

Prominence  referred  to  the  size  of  the  manufacturer,  the  length  of  time  the 
manufacturer  had  been  making  microcomputers  or  similar  equipment,  the 
number  of  microcomputers  the  manufacturer  had  delivered,  and  the  perceived 
probability  that  the  manufacturer  would  continue  to  make  microcomputer 
equipment.  This  factor  was  considered  important  because  it  is  difficult  to 
obtain  maintenance  support  for  equipment  that  is  no  longer  being 
manufactured  or  that  was  developed  by  a  company  that  is  no  longer  in 
business.

The  ability  to  separate  the  diskette  drives  from  the  display  and  response
device  was  considered  important  because  there  was  some  concern  that 
examinees  might  put  things  into  the  diskette  drives  if  they  were  openly  visible 
and  accessible.  This  would  be  especially  important  in  a  hostile  environment 
that  might  surround  the  administration  of  some  psychological  tests. 

The  final  factor,  manufacturing  site,  was  important  because  of  government 
procurement  regulations  that  might  require  some  potential  users  to  buy 
American-made  equipment. 

Four  microcomputers  were  considered  acceptable  on  all  four  qualifier 
factors.  These  were  the  IBM  PC,  the  Texas  Instruments  Professional  Computer, 
the  Xerox  16/8,  and  the  WICAT  S-150.  Of  these,  the  IBM  PC  was  ranked  the
highest.  It  differed  from  its  two  closest  competitors  (the  Texas  Instruments 
Professional  and  the  Xerox  16/8)  only  in  the  prominence  of  the  company  as  a 
manufacturer  of  computer  equipment. 

A  final  configuration  was  designed  around  the  IBM  PC  and  consisted  of  a 
network  of  testing  stations  communicating  with  two  network  servers.  The 
connecting  network  selected  was  the  3COM  Ethernet  network.  This  network 
was  selected  because  it  was  the  only  commercially  available  network  that  met 
the  specifications  and  could  be  serviced  on  a  national  basis  along  with  the 
computer  equipment.  The  testing  stations  were  configured  as  single-diskette 
computers.  The  servers  were  IBM  PC-XT  computers,  each  having  a  hard  disk 
and  a  diskette  drive. 

Bids  were  then  solicited  from  all  vendors  who  could  supply  and  maintain 
the  equipment  as  required.  Maintenance  was  a  difficult  requirement  because, 
although  the  equipment  was  being  purchased  in  Minnesota,  all  that  was  known 
at  the  time  about  its  ultimate  location  was  that  it  would  not  be  Minnesota. 
Therefore,  the  vendor  had  to  have  a  national  maintenance  network  in  place. 
Only  two  companies  were  able  to  respond  at  the  time  the  bid  was  requested: 
Computerland  and  Sears.  (IBM  could  not  respond  because  the  3COM  network 
was  not  an  IBM  product.)  Computerland  won  the  bid  on  the  basis  of  its  lower 
price. 

Four  computers  were  purchased  immediately  and  assembled  into  a  small 
version  of  the  future  testing  network.  The  remaining  computers  were 
purchased  later  in  the  project  when  they  were  needed. 


IMPLEMENTATION  OF  THE  SOFTWARE 

Although  the  basic  design  of  the  software  was  completed  during  Phase  I  of 
the  effort  and  much  of  the  software  had  been  written  at  private  expense 
between  the  project’s  two  phases,  substantial  design  and  augmentation  were 
required  for  the  final  system.  The  field  test  application  was  selected  early  in 
the  project:  the  system  would  be  used  at  the  Basic  Electricity  and  Electronics 
(BE&E)  School  at  the  Naval  Training  Center  (NTC)  in  San  Diego.  It  would  be 
used  to  implement  new  forms  of  diagnostic  testing  being  developed  at  the 
University  of  Illinois. 


Meetings  with  Navy  and  University  of  Illinois  personnel  early  in  this  phase 
of  the  project  revealed  two  deficiencies  in  the  system.  First,  it  had  no  graphics 
capabilities.  Graphics  would  be  necessary  to  display  the  electronics  items  that 
would  be  administered  in  the  BE&E  School.  The  second  deficiency  was  that 
the  system  could  not  specify  tests  using  the  new  diagnostic  algorithms  that  were 
being  developed.  To  solve  this  problem,  it  was  agreed  that  a  custom  interface 
would  be  added  to  the  system  so  that  procedures  to  implement  these  new 
techniques  could  be  developed  in  FORTRAN  or  Pascal. 

The  majority  of  the  design  specified  in  Phase  I  had  been  implemented  on  a 
PDP  11  minicomputer.  Software  development  for  Phase  II  began  by 
transferring  these  programs  to  the  IBM  personal  computers  and  modifying  them 
as  necessary.  In  general,  this  was  not  a  difficult  task.  The  major  changes  were 
in  version-specific  Pascal  differences  and  operating-system-specific  function 
calls. 

An  initial  version  of  a  graphics  editor  was  designed  and  developed.
Several  preliminary  versions  were  delivered  to  the  University  of  Illinois  for
evaluation.  The  final  version  allowed  colored  drawings  to  be  developed 
interactively  on  the  IBM  PC  using  either  a  mouse  or  the  arrow  keys  for  cursor 
movement. 

The  design  of  the  test  development  software  provided  for  an  authoring 
language  to  develop  the  tests  and  a  compiler  to  translate  the  authoring  language 
into  a  form  that  could  be  executed  quickly.  In  the  version  developed  for  the 
IBM  PC,  the  compiler  also  bit-maps  and  compresses  the  graphics  items.  While  it 
might  take  as  much  as  a  minute  for  the  computer  to  display  an  item  using  the 
graphics  commands,  the  compressed  bit-mapped  version  can  be  displayed  in  less 
than  half  a  second. 
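
The speed gain comes from doing the drawing work once, at compile time, and storing the resulting pixels compactly. The report does not name the compression method, so the run-length encoding below is purely an illustrative stand-in, and the function names and the sample row are hypothetical.

```python
def rle_encode(pixels):
    """Collapse a row of pixel values into (value, run_length) pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(value, count) for value, count in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into a row of pixels."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels

# One 320-pixel row of a line drawing: mostly background with a short stroke
row = [0] * 100 + [1] * 20 + [0] * 200
encoded = rle_encode(row)
print(encoded)  # [(0, 100), (1, 20), (0, 200)]
```

Line drawings such as circuit diagrams consist largely of long runs of background, so a scheme of this kind stores them in a small fraction of the raw bitmap size and decodes them with simple copying rather than by re-executing the drawing commands.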

The  entire  software  system  was  described  in  the  final  User’s
Manual  for  the  MicroCAT  Testing  System,  distributed  as  Research  Report
ONR-85-1  (Assessment  Systems  Corporation,  1984).  This  manual  contains  an  overview
of  computerized  adaptive  testing  and  discusses  the  many  features  of  the
MicroCAT™  Testing  System  in  four  sections  corresponding  to  the  four
MicroCAT  subsystems.

The  section  on  the  Development  Subsystem  describes  the  Graphics  Item 
Banker,  the  font  generator,  creating  tests  from  predefined  test  templates,  and 
the  test  compiler.  The  section  on  the  Examination  Subsystem  describes  how  to 
administer  tests.  The  Assessment  Subsystem  section  describes  programs  for 
collecting  data  from  several  administrations  into  a  common  file,  performing 
conventional  item  analyses,  estimating  item  response  theory  (IRT)  item 
parameters,  evaluating  the  adaptive  potential  of  an  item  pool,  and  computing 
test  validity  coefficients.  Finally,  the  section  on  the  Management  Subsystem 
describes  programs  that  allow  a  network  of  testing  stations  to  be  managed  from 
a  single  proctoring  terminal. 

The  User’s  Manual  also  describes  the  practical  details  of  the  authoring 
language,  MCATL  (Minnesota  Computerized  Adaptive  Testing  Language). 
Further  details  about  this  authoring  language  are  provided  in  Research  Report
ONR-85-3,  MCATL:  A  Language  for  Authoring  Computerized  Adaptive  Tests
(Vale,  1985b).  This  report  describes  the  rationale  for  the  development  of 
elements  of  the  language  as  well  as  its  formal  specification. 

Research  Report  ONR-85-4,  ASCAL:  A  Microcomputer  Program  for 
Estimating  Logistic  IRT  Item  Parameters  (Vale  &  Gialluca,  1985),  describes  the 
technical  details  of  ASCAL  (for  Assessment  Systems  CALibration),  the  IRT
parameter  estimation  program  included  in  MicroCAT.  ASCAL  uses  an  algorithm  very
similar  to  that  of  the  industry-standard  calibration  program  LOGIST  (Wingersky,
Barton,  &  Lord,  1982).  It  differs  from  LOGIST  in  that  it  runs  on  a
microcomputer  and  uses  Bayesian  prior  distributions  on  several  parameters. 
When  it  is  run  on  an  IBM  PC  with  an  8087  math  coprocessor,  it  performs  a 
calibration  of  reasonable  size  (e.g.,  35  items  and  3,000  subjects)  in  a  reasonable 
amount  of  time  (e.g.,  less  than  two  hours).  When  it  is  run  without  the 
coprocessor,  the  same  calibration  may  take  24  hours. 
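
For orientation, the quantities being estimated can be written out. The block below assumes the three-parameter logistic model, which is the model LOGIST fits; the form of the priors and the parameters to which ASCAL applies them are not given in this report, so g is only a generic placeholder, and the symbols are conventional IRT notation rather than ASCAL's own.

```latex
% Probability that an examinee of ability \theta answers item i correctly
P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp\left[-D\,a_i(\theta - b_i)\right]}

% Log posterior for the parameters of item i, given responses u_{ij}
% of N examinees with abilities \theta_j and a prior density g
\log p(a_i, b_i, c_i \mid \mathbf{u}_i, \boldsymbol{\theta})
  = \sum_{j=1}^{N} \left[ u_{ij} \log P_i(\theta_j)
      + (1 - u_{ij}) \log\bigl(1 - P_i(\theta_j)\bigr) \right]
  + \log g(a_i, b_i, c_i) + \text{const}
```

Maximizing an expression of this kind for 35 items over 3,000 examinees requires many passes of floating-point exponentials and logarithms, which is consistent with the large difference in running time with and without the 8087 coprocessor.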


FIELD  TEST  OF  THE  SYSTEM 

Implementation  of  the  MicroCAT  system  at  the  BE&E  School  began  in  June 
of  1984.  A  system  consisting  of  15  testing  stations,  two  network  servers,  and 
one  proctoring  station  was  assembled.  Several  tests  from  the  BE&E  curriculum 
were  implemented  on  the  system  for  initial  system  evaluation. 

The  entire  system  was  interfaced  to  MIISA,  the  mainframe  computer  in 
Memphis,  Tennessee,  which  manages  all  of  the  instruction  at  NTC.  To  avoid 
reprogramming  of  MIISA  (a  task  considered  nearly  impossible  by  NTC),  the 
testing  system  was  made  to  look  like  a  GE  Terminet  terminal,  from  which 
MIISA  was  accustomed  to  receiving  test  results.  Thus,  MIISA  was  told  to  expect 
a  new  Terminet  in  the  testing  room,  and  the  testing  network  was  connected. 
This  technique  worked  very  well;  the  connection  allowed  the  testing  network  to 
get  test  assignments  from  MIISA,  and  MIISA  to  get  test  results  from  the 
network.  The  only  problem  with  this  connection  was  that  when  MIISA  failed, 
no  new  tests  could  be  initiated  until  it  was  fixed.  MIISA  was  the  only
non-redundant  component  in  the  testing  system.

Details  of  the  NTC  implementation  are  described  in  Research  Report
ONR-85-2,  Implementation  of  a  Microcomputer-Based  Testing  System  in  a  Military
Training  Environment  (Vale,  1985a).  This  report  provides  details  of  how  the 
MicroCAT  system  was  adapted  to  the  NTC  implementation. 

In  addition  to  the  NTC  implementation,  several  MicroCAT  systems  were 
distributed  to  non-government  users  for  use  and  evaluation.  While  the  NTC 
implementation  provided  volume  tests  of  the  simple  parts  of  the  MicroCAT 
system,  these  other  sites  provided  tests  of  the  more  advanced  features  of  the 
system. 


EVALUATION  AND  REFINEMENT  OF  THE  SYSTEM 


As  the  system  was  implemented  at  the  evaluation  sites,  it  was  put  to  use 
almost  immediately.  In  the  early  implementations,  occasional  bugs  were  found 
in  the  system.  These  were  corrected  as  they  were  found. 

More  frequently,  however,  requests  came  for  additional  features  in  the 
system.  The  NTC  implementation  generated  most  of  the  initial  requests.  These 
included  a  request  to  allow  the  examinee  to  skip  items  early  in  the  testing 
process  and  then  return  and  answer  them  later.  This  feature  was  omitted 
originally  because  it  is  incompatible  with  adaptive  testing.  However,  it  is  an 
important  feature  when  the  MicroCAT  system  is  used  for  conventional  testing. 

Another  feature  that  was  implemented  in  response  to  requests  from  the 
field  was  the  inclusion  of  high-resolution  text-only  items.  The  original  system 
was  intended  only  for  medium-resolution  graphics  items.  The  addition  of  this 
feature  made  a  wider  range  of  textual  items  possible. 

Other  features  have  been  suggested  and  will  be  implemented  in  the  future. 
Split-screen  text  items,  in  which  a  reading  passage  scrolls  in  the  top  of  the 
screen  while  a  question  remains  stationary  in  the  lower  portion  of  the  screen, 
have  been  partially  implemented.  Other  features  that  may  also  be  implemented 
include  a  hard-copy  item  banker  and  random  item  selection  from  a  domain. 


FUTURE  PLANS 

The  MicroCAT  Testing  System,  which  was  designed  and  refined  in  this 
project,  is  now  a  commercial  software  product.  Although  it  was  initially 
intended  for  a  relatively  small  group  of  users  (i.e.,  those  who  wanted  to 
implement  adaptive  tests),  it  appears  that  the  market  is  expanding.  Several 
good  suggestions  obtained  during  the  course  of  the  contract  will  be 
implemented  as  revenues  allow. 

In  its  current  state,  MicroCAT  is  a  well-tested,  stand-alone  adaptive  testing 
system  capable  of  administering  a  variety  of  adaptive  tests.  Since  its  support  is 
now  commercial,  the  additions  that  will  be  made  first  are  those  most  in  demand 
in  the  market.  Specifically,  since  the  education  community  appears  to  be  one 
of  the  most  promising  markets,  features  such  as  sampling  items  from  a  domain, 
split-screen  text  items,  and  conventional  item-banking  capabilities  will  be  added 
first.  As  revenues  allow  and  research  suggests,  new  item  types  and  testing 
strategies  will  also  be  added. 